
Self Hosted E-Receipts API

Introduction

Welcome to the BlinkReceipt Self Hosted API. This API allows you to take advantage of the full power of our E-Receipts API without any PII leaving your infrastructure.

Prerequisites

  • Install Docker
  • Contact your account representative for a starter Docker environment file that will be pre-filled with credentials specific to your account

Infrastructure Requirements

Before deploying the application, ensure the following infrastructure is provisioned and accessible to the containers:

  • PostgreSQL-compatible database instance

    • Recommended: Amazon RDS (Aurora PostgreSQL)
    • Create a database and a user with read/write access to it
  • Redis instance

    • Recommended: Amazon ElastiCache (Redis)
  • AmazonMQ for RabbitMQ Instance

    • Recommended: AmazonMQ with RabbitMQ engine
    • Instance Type: mq.t3.micro (minimum)
    • Engine version: 3.11 or later
    • Deployment mode: Single-instance (for development) or Cluster (for production)
    • Network access: Ensure your containers can connect to the broker endpoint
  • S3 Bucket for OTA data updates

    • The application must have read access to this bucket.
    • Actual's AWS account requires write access to push OTA updates.
  • IAM Role Configuration (AWS)

    • Attach IAM role(s) to your ECS tasks or EC2 instance profiles with the necessary permissions to access the S3 bucket, DB, Redis, and AmazonMQ.
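
Before deploying, it can help to confirm that the backing services are reachable from wherever the containers will run. The following is a minimal sketch using only the Python standard library. It reads the same environment variable names listed later in this document. The helper names (`can_connect`, `endpoints_from_env`) are illustrative and not part of the product. It checks TCP reachability only, not credentials or TLS.

```python
import socket
from urllib.parse import urlsplit


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def endpoints_from_env(env: dict) -> dict:
    """Collect (host, port) pairs for PostgreSQL, Redis, and RabbitMQ.

    Defaults mirror the documented defaults (5432, 6379, and
    amqp://localhost:5672). An amqps:// URL without an explicit port
    is assumed to use 5671.
    """
    mq = urlsplit(env.get("RABBITMQ_URL", "amqp://localhost:5672"))
    default_mq_port = 5671 if mq.scheme == "amqps" else 5672
    return {
        "postgres": (env.get("DB_HOST", ""), int(env.get("DB_PORT", "5432"))),
        "redis": (env.get("REDIS_HOST", ""), int(env.get("REDIS_PORT", "6379"))),
        "rabbitmq": (mq.hostname or "", mq.port or default_mq_port),
    }
```

A typical use is to loop over `endpoints_from_env(dict(os.environ)).items()` and report every service for which `can_connect(host, port)` returns False.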

System Overview

The application is designed to run entirely within containers and supports four core roles, all of which are encapsulated in the same Docker image and differentiated by the ROLE environment variable:

API Service (ROLE=API)

  • Exposes the main HTTP endpoints.
  • Requires connectivity to the database, Redis, and RabbitMQ.
  • Should be scaled based on expected request traffic.

Worker Service (ROLE=WORKER)

  • Processes background jobs from RabbitMQ (e.g., data processing, extraction tasks).
  • Should be scaled depending on job volume and desired latency.
  • Requires connectivity to Redis and RabbitMQ.

DB Migration Script (ROLE=MIGRATE_DB)

  • Should be run once with every deployment. It creates the tables on first run and applies any subsequent migrations as new versions of the Docker image are released.
  • Requires connectivity to the database.

OTA Update Cron (ROLE=UPDATE_CRON)

  • Run upon deployment and periodically (e.g., once per day).
  • Pulls updated data files from S3 and updates the database accordingly.
  • Requires connectivity to the database and S3.
  • Note: The updates are designed to be applied as a "hot swap" with minimal downtime, but using a database such as Aurora with read replicas will provide additional safety.

Environment Variables

The application expects a number of environment variables to be set. Many of these will be pre-populated by your account rep, and many RabbitMQ configuration parameters are handled automatically by the Docker image. The variables you are expected to populate are:

| Variable Name | Default | Description |
|---|---|---|
| ROLE | | API - HTTP API service; WORKER - Extraction job processor; MIGRATE_DB - Apply DB migrations; UPDATE_CRON - Script to check for and apply OTA updates |
| REDIS_HOST | | Redis server hostname |
| REDIS_PORT | 6379 | Redis server port |
| REDIS_USER | | Redis username (optional) |
| REDIS_PASSWORD | | Redis password (optional) |
| REDIS_TLS | false | Whether to use TLS to connect to Redis |
| REDIS_DB | 0 | The number of the Redis database to use |
| DB_HOST | | PostgreSQL server hostname |
| DB_PORT | 5432 | PostgreSQL server port |
| DB_NAME | | PostgreSQL database name |
| DB_USERNAME | | PostgreSQL username |
| DB_PASSWORD | | PostgreSQL password |
| DB_SSL | false | Controls whether the app's PostgreSQL connection uses SSL/TLS encryption |
| AWS_REGION | us-east-1 | AWS region for S3 and other services |
| DB_UPDATE_ROLE_ARN | | The IAM role ARN for accessing the S3 bucket which will contain OTA DB updates |
| DB_UPDATE_S3_BUCKET | | The name of the S3 bucket which will contain OTA DB updates |
| SCRAPE_WORKER_TIMEOUT_MS | 60000 | Timeout in milliseconds for each template to be processed |
| CPP_THREAD_POOL_SIZE | 50 | Maximum number of worker threads for handling concurrent requests |
| CPP_RETRY_MAX_ATTEMPTS | 10 | Maximum number of retry attempts before a request fails |
| RABBITMQ_MESSAGE_EXPIRE_MS | 60000 | Time in milliseconds until an unacked message expires in a queue. This should have the same value as SCRAPE_WORKER_TIMEOUT_MS |
| RABBITMQ_TEMPLATE_BATCH | 5 | The number of templates processed per batch. Increasing this requires more memory for each worker; we estimate that each template in a batch costs ~300 MB of memory |
| RABBITMQ_URL | amqp://localhost:5672 | RabbitMQ connection URL (use amqps:// for TLS) |
| NODE_MEMORY | 1900 | Value in MB used to set max-old-space-size via NODE_OPTIONS |
| SCRAPE_QUEUE | scrape_queue | Name of the queue where workers listen for templates to process |
| HTTPS | false | Enables HTTPS support when set to "true"; requires CERT_CN |
| CERT_CN | | The Common Name for the TLS certificate that the app will auto-generate |
| ENABLE_SCAN_API | false | Allows the extraction API to send receipts to the OCR scanning service for specific merchants |
| OCR_SCAN_URL_ON_PREM | | Base URL of the OCR scanning service |
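
Several of the constraints above can be checked before a container starts. The sketch below is one possible pre-flight check, not something shipped with the image. The per-role list of required variables is an assumption inferred from the connectivity requirements in the System Overview. The two cross-variable rules come from the descriptions above: the two timeouts should match, and HTTPS requires CERT_CN.

```python
DB_VARS = ["DB_HOST", "DB_NAME", "DB_USERNAME", "DB_PASSWORD"]

# Assumed mapping, derived from the connectivity each role is documented to need.
REQUIRED_BY_ROLE = {
    "API": DB_VARS + ["REDIS_HOST", "RABBITMQ_URL"],
    "WORKER": ["REDIS_HOST", "RABBITMQ_URL"],
    "MIGRATE_DB": DB_VARS,
    "UPDATE_CRON": DB_VARS + ["DB_UPDATE_S3_BUCKET"],
}


def validate_env(env: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    role = env.get("ROLE")
    if role not in REQUIRED_BY_ROLE:
        return [f"ROLE must be one of {sorted(REQUIRED_BY_ROLE)}, got {role!r}"]

    problems = []
    for name in REQUIRED_BY_ROLE[role]:
        if not env.get(name):
            problems.append(f"{name} is required for ROLE={role}")

    # The message expiry should equal the per-template worker timeout.
    timeout = env.get("SCRAPE_WORKER_TIMEOUT_MS", "60000")
    expire = env.get("RABBITMQ_MESSAGE_EXPIRE_MS", "60000")
    if timeout != expire:
        problems.append(
            "RABBITMQ_MESSAGE_EXPIRE_MS should equal SCRAPE_WORKER_TIMEOUT_MS"
        )

    # HTTPS support needs a Common Name for the auto-generated certificate.
    if env.get("HTTPS", "false") == "true" and not env.get("CERT_CN"):
        problems.append("HTTPS=true requires CERT_CN")
    return problems
```

Calling `validate_env(dict(os.environ))` at container start and exiting on a non-empty result surfaces misconfiguration early instead of at the first failed connection.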

AmazonMQ for RabbitMQ Setup

To set up AmazonMQ for RabbitMQ, follow these steps:

1. Create AmazonMQ Broker

  1. Navigate to AmazonMQ Console

    • Go to the AmazonMQ console in your AWS account
    • Click "Create broker"
  2. Configure Broker Settings

    • Engine type: RabbitMQ
    • Engine version: 3.11.x or later (recommended)
    • Deployment mode:
      • Single-instance for development/testing
      • Cluster for production (provides high availability)
    • Instance type: mq.t3.micro (minimum) or larger based on your throughput requirements
  3. Configuration

    • Broker name: Choose a descriptive name (e.g., ereceipt-extraction-broker)
    • Username and Password: Create credentials for the broker (you'll use these in RABBITMQ_URL)
  4. Connectivity

    • Virtual Private Cloud (VPC): Select the same VPC where your containers will run
    • Subnet(s): Choose appropriate subnets
    • Security groups: Create or select security groups that allow:
      • Inbound access on port 5671 (AMQP with TLS) or 5672 (AMQP without TLS)
      • Access from your container security groups

2. Configure Environment Variables

Once your AmazonMQ broker is created, you'll get an endpoint URL. Configure your environment variables:

# For TLS connection (recommended for production)
RABBITMQ_URL=amqps://username:password@your-broker-id.mq.us-east-1.amazonaws.com:5671

# For non-TLS connection (development only)
RABBITMQ_URL=amqp://username:password@your-broker-id.mq.us-east-1.amazonaws.com:5672
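
Broker passwords often contain characters such as `@` or `/` that would break the URL if pasted in verbatim. The following sketch builds the value for RABBITMQ_URL with percent-encoded credentials. It assumes the client inside the image decodes percent-encoded credentials in the usual way for AMQP URIs. The function name and the example hostname are illustrative only.

```python
from urllib.parse import quote


def build_rabbitmq_url(username: str, password: str, host: str, tls: bool = True) -> str:
    """Build an amqp(s):// URL, percent-encoding the credentials.

    TLS uses port 5671 and non-TLS uses port 5672, matching the
    ports listed in the security group settings above.
    """
    scheme, port = ("amqps", 5671) if tls else ("amqp", 5672)
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"
```

For example, `build_rabbitmq_url("svc", "p@ss/word", "your-broker-id.mq.us-east-1.amazonaws.com")` encodes the password as `p%40ss%2Fword`, so the `@` that separates credentials from the host stays unambiguous.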

3. Security Group Configuration

Ensure your security groups allow:

  • Outbound from your container security group to AmazonMQ security group on port 5671/5672
  • Inbound to AmazonMQ security group from your container security group on port 5671/5672

4. Network Connectivity

  • If using public subnets: Ensure your AmazonMQ broker has public access enabled
  • If using private subnets: Ensure proper routing between your container subnets and AmazonMQ subnets
  • For cross-AZ deployment: Consider placing your broker in multiple availability zones for resilience

Running a container

Quick Start

  • Download this Docker Compose file
  • Make sure your .env.client is in your project folder and all vars are populated
  • Decide how you will authenticate to AWS and set env vars / modify docker-compose-ereceipts.yml accordingly:
    • Currently the docker compose file mounts your local ~/.aws folder into the relevant containers so that they can authenticate the same way you do locally
    • It also sets the AWS_PROFILE env var which you should delete if you want it to use your default config, or override with a profile value of your choice
    • If you prefer to authenticate using an access key ID and secret access key, you can remove the ~/.aws mounts from all containers and instead set the appropriate env vars
  • Make sure to set your db-related env vars according to whatever values (user, pass, db name) you pass into the postgres container in docker-compose-ereceipts.yml

This should serve as a blueprint for how the service is orchestrated in production.

Testing the API

Once the container is running, you can make requests using http[s]://localhost:4001 as the base URL. You can find the request and response structures in our API Spec.

Logging

Logs are written to stdout and can be collected or shipped as needed.

Production Deployment

Seeding DB

Make sure to run the DB Migration Script and the OTA Update Cron for each environment upon deployment, so that the DB has the structure and data the app needs to perform extraction.
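
One way to script this seeding step is sketched below. It only builds the `docker run` argument lists. The image name passed in is a hypothetical placeholder. Running the migration before the OTA update is an assumption, based on the migration creating the tables that the update then populates. In ECS or Kubernetes the equivalent would be two one-off tasks or jobs with the same ROLE values.

```python
def one_off_command(role: str, image: str, env_file: str = ".env.client") -> list:
    """Build a `docker run` argv that runs the image once in the given role."""
    return [
        "docker", "run", "--rm",
        "--env-file", env_file,   # shared settings (DB, Redis, S3, ...)
        "-e", f"ROLE={role}",     # selects which role the image runs as
        image,
    ]


def seed_commands(image: str) -> list:
    """Commands to run on each deployment, in the assumed order."""
    return [
        one_off_command("MIGRATE_DB", image),   # create or migrate tables first
        one_off_command("UPDATE_CRON", image),  # then load the OTA data
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failed migration stops the deployment before the OTA update is attempted.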

Optimizing Performance

For optimal performance in a Kubernetes or ECS deployment, we recommend starting with the following configuration:

API

  • vCPUs: 1
  • RAM: 1 GiB
  • Replicas: 6

Workers

  • vCPUs: 1.5
  • RAM: 2.5 GiB
  • Replicas: 50

Worker Env Vars

  • RABBITMQ_TEMPLATE_BATCH 5
  • NODE_MEMORY 1900

Expected throughput is ~1 rpm per worker, and average latency is ~5s. Scaling horizontally will improve throughput while maintaining the same latency characteristics.
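
The figures above allow a rough sizing estimate. The sketch below uses the stated numbers: ~1 rpm per worker and ~300 MB per template in a batch. It assumes throughput scales linearly with the worker count. The function names are illustrative, and the headroom figure is only an approximation of what remains for the rest of the process.

```python
import math


def workers_for_throughput(target_rpm: float, rpm_per_worker: float = 1.0) -> int:
    """Number of worker replicas needed for a target requests-per-minute."""
    return math.ceil(target_rpm / rpm_per_worker)


def batch_memory_mb(template_batch: int, mb_per_template: int = 300) -> int:
    """Estimated memory used by one batch of templates on a worker."""
    return template_batch * mb_per_template


def memory_headroom_mb(ram_gib: float, template_batch: int) -> float:
    """Container RAM (converted to MiB) left over after the batch estimate."""
    return ram_gib * 1024 - batch_memory_mb(template_batch)
```

With the recommended settings, `batch_memory_mb(5)` gives 1500, which leaves `memory_headroom_mb(2.5, 5)`, roughly 1060 MB, within a 2.5 GiB worker. Raising RABBITMQ_TEMPLATE_BATCH to 8 would push the estimate to 2400 MB and leave very little room.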

Readiness & Health Checks

For orchestrators such as Kubernetes that support readiness and liveness probes, these are available at GET /readyz and GET /healthz respectively. Both endpoints return status code 200 on success and 500 otherwise.
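
Outside an orchestrator, the same endpoints can be polled from a deployment script. The following is a minimal sketch using the Python standard library. The function names are illustrative, and the base URL is whatever address the API container is exposed on (for example http://localhost:4001 in the Quick Start setup). With HTTPS and the auto-generated certificate, default certificate verification would likely fail, so this sketch assumes plain HTTP.

```python
import time
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0):
    """Return the HTTP status for GET base_url + path, or None if unreachable."""
    try:
        url = base_url.rstrip("/") + path
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as the documented 500) arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(base_url: str, attempts: int = 30, delay_s: float = 2.0) -> bool:
    """Poll /readyz until it returns 200 or the attempts run out."""
    for _ in range(attempts):
        if probe(base_url, "/readyz") == 200:
            return True
        time.sleep(delay_s)
    return False
```

A deployment script can call `wait_until_ready("http://localhost:4001")` after starting the API container and only route traffic once it returns True.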

Debugging

To debug data quality issues (e.g., wrong or missing fields), it is most helpful to provide us with the blinkReceiptId associated with the request, as well as the email that was passed in, so that we can attempt to reproduce the issue.